-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Missing Ingestion Jobs from WebUI Table #679
base: main
Are you sure you want to change the base?
fix: Missing Ingestion Jobs from WebUI Table #679
Conversation
WalkthroughThe pull request introduces a new Changes
Possibly related issues
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (3)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
components/webui/imports/api/ingestion/server/CompressionDbManager.js (2)
26-31
: Update JSDoc to reflect parameter changes.The JSDoc comment still references the removed
limit
parameter. Please update it to describe thelastUpdateDate
parameter.- * Retrieves the last `limit` number of jobs and the ones with the given + * Retrieves jobs updated since the given date and the ones with the given * @param {string} lastUpdateDate
62-64
: Consider optimizing the SQL query structure.The current implementation adds the UPDATE_TIME filter to each UNION query, which could impact performance. Consider moving the filter to a WHERE clause after the UNION to improve query efficiency.
- WHERE - _id=${jobId} && - ${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.UPDATE_TIME} >= '${lastUpdateDate}' + WHERE _id=${jobId}Then add after all UNION queries:
WHERE ${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.UPDATE_TIME} >= '${lastUpdateDate}'components/webui/imports/ui/IngestView/panels/IngestionJobs/IngestionJobRow.jsx (1)
109-115
: Consider using dayjs for consistent date formatting.The component already uses dayjs for duration calculations. For consistency, consider using it for formatting update_time as well.
- text={(null === job.update_time) ? - "null" : - new Date(job.update_time).toLocaleString()}/> + text={(null === job.update_time) ? + "null" : + dayjs(job.update_time).format('YYYY-MM-DD HH:mm:ss')}/>components/webui/imports/api/ingestion/server/publications.js (2)
23-23
: Add a comment explaining the magic number.The constant CONST_FOR_DATE_FORMAT = 19 is used for date string manipulation but its purpose isn't clear. Add a comment explaining that it represents the length of the MySQL datetime format "YYYY-MM-DD HH:MM:SS".
+// Length of MySQL datetime format "YYYY-MM-DD HH:MM:SS" const CONST_FOR_DATE_FORMAT = 19;
48-50
: Extract date formatting logic into a helper function.The date string manipulation logic is duplicated. Extract it into a reusable helper function to improve maintainability.
+/** + * Formats a date object to MySQL datetime format + * @param {Date} date + * @return {string} + */ +const formatToMySQLDateTime = (date) => { + return date.toISOString() + .slice(0, CONST_FOR_DATE_FORMAT) + .replace("T", " "); +}; + -let lastUpdateDate = new Date().toISOString() - .slice(0, CONST_FOR_DATE_FORMAT) - .replace("T", " "); +let lastUpdateDate = formatToMySQLDateTime(new Date()); // In refreshCompressionJobs: - const newDate = new Date().toISOString() - .slice(0, CONST_FOR_DATE_FORMAT) - .replace("T", " "); + const newDate = formatToMySQLDateTime(new Date());Also applies to: 107-109
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py
(1 hunks)components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
(5 hunks)components/webui/imports/api/ingestion/constants.js
(1 hunks)components/webui/imports/api/ingestion/server/CompressionDbManager.js
(3 hunks)components/webui/imports/api/ingestion/server/publications.js
(3 hunks)components/webui/imports/ui/IngestView/panels/IngestionJobs/IngestionJobRow.jsx
(1 hunks)components/webui/imports/ui/IngestView/panels/IngestionJobs/index.jsx
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
components/webui/imports/ui/IngestView/panels/IngestionJobs/index.jsx (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/webui/imports/api/ingestion/constants.js (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/webui/imports/ui/IngestView/panels/IngestionJobs/IngestionJobRow.jsx (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/webui/imports/api/ingestion/server/CompressionDbManager.js (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/webui/imports/api/ingestion/server/publications.js (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: build (macos-latest)
🔇 Additional comments (4)
components/webui/imports/api/ingestion/constants.js (1)
13-13
: LGTM! The new UPDATE_TIME constant is well-placed.The addition follows the existing naming convention and maintains logical grouping with other time-related columns.
components/webui/imports/ui/IngestView/panels/IngestionJobs/index.jsx (1)
50-50
: LGTM! The new column header is well-integrated.The "Last Updated" column follows the existing table design pattern with consistent right alignment.
components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py (1)
58-58
: LGTM! The update_time column addition is well-structured.The column definition aligns with other timestamp columns in the table and is appropriately positioned with other time-related fields.
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
212-219
: LGTM! Consistent timestamp updates across all job state changes.The update_time is properly set at all critical points where job status changes:
- When no tasks are created
- When job starts
- When job succeeds
- When job fails
Also applies to: 233-235, 339-339, 350-350
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
components/webui/imports/api/ingestion/server/publications.js (1)
106-112
:⚠️ Potential issuePrevent potential race condition in lastUpdateDate updates.
The current implementation has a potential race condition where jobs updated between saving newDate and updating lastUpdateDate could be missed. Consider using a transaction or updating lastUpdateDate before the query.
Additionally, consider adding error handling for date operations to prevent potential runtime errors.
#!/bin/bash # Description: Check for potential race conditions by analyzing job update patterns # Search for concurrent job updates in the codebase rg -A 5 "UPDATE.*COMPRESSION_JOBS.*SET"
🧹 Nitpick comments (2)
components/webui/imports/api/ingestion/server/publications.js (2)
23-24
: Document the significance of the magic number.The constant
CONST_FOR_DATE_FORMAT = 19
appears to be the length of a MySQL datetime string (YYYY-MM-DD HH:mm:ss). Consider adding a comment explaining this or using a more descriptive constant name.+// Length of MySQL datetime string format 'YYYY-MM-DD HH:mm:ss' const CONST_FOR_DATE_FORMAT = 19;
45-51
: Extract date formatting logic into a reusable function.The date formatting logic is duplicated between initialization and updates. Consider extracting it into a helper function to improve maintainability and reduce duplication.
+/** + * Formats a Date object to MySQL datetime string format + * @param {Date} date - The date to format + * @return {string} Formatted date string in 'YYYY-MM-DD HH:mm:ss' format + */ +const formatToMySQLDateTime = (date) => { + return date.toISOString() + .slice(0, CONST_FOR_DATE_FORMAT) + .replace("T", " "); +}; + -let lastUpdateDate = new Date().toISOString() - .slice(0, CONST_FOR_DATE_FORMAT) - .replace("T", " "); +let lastUpdateDate = formatToMySQLDateTime(new Date());
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
components/webui/imports/api/ingestion/server/publications.js
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/webui/imports/api/ingestion/server/publications.js (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partial review
@@ -209,12 +209,14 @@ def search_and_schedule_new_tasks(db_conn, db_cursor, clp_metadata_db_connection | |||
|
|||
if len(tasks) == 0: | |||
logger.warning(f"No tasks were created for job {job_id}") | |||
update_time = datetime.datetime.now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better if we add this logic into update_compression_job_metadata
so whenever we update the job, the update_time
can be updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, it would be cleaner to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, instead of using Python's datetime.datetime.now()
, let's use NOW()
from SQL to avoid time synchronization issues between the scheduler and the MySQL server.
@@ -55,6 +55,7 @@ def main(argv): | |||
`status_msg` VARCHAR(512) NOT NULL DEFAULT '', | |||
`creation_time` DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3), | |||
`start_time` DATETIME(3) NULL DEFAULT NULL, | |||
`update_time` DATETIME(3) NULL DEFAULT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@haiqi96 Since there can be frequent sorts, shall we consider building an index on this column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if we plan to display based on updatetime, maybe we should add index for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to still order by the job id, but we want to use the last update time, so we just grab to rows that have been updating rather than pushing all the rows overtime we check for updates to display.
hey @haiqi96 , can you help review the package scripts? |
@@ -55,6 +55,7 @@ def main(argv): | |||
`status_msg` VARCHAR(512) NOT NULL DEFAULT '', | |||
`creation_time` DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3), | |||
`start_time` DATETIME(3) NULL DEFAULT NULL, | |||
`update_time` DATETIME(3) NULL DEFAULT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we also set DEFAULT CURRENT_TIMESTAMP(3)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends? the creator of this row should be webui? so if the update_time = null means scheduler never updates it, which would make more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great job
components/webui/imports/ui/IngestView/panels/IngestionJobs/IngestionJobRow.jsx
Show resolved
Hide resolved
@@ -59,7 +59,9 @@ class CompressionDbManager { | |||
( | |||
SELECT * | |||
FROM SelectedColumns | |||
WHERE _id=${jobId} | |||
WHERE | |||
_id=${jobId} && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the AND
operator which is less MySQL specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may no longer apply if you agree with the function body change proposed above.
WHERE _id=${jobId} | ||
WHERE | ||
_id=${jobId} && | ||
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.UPDATE_TIME} >= '${lastUpdateDate}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of comparing against a string date, we can pass in a timestamp and use FROM_UNIXTIME()
?
ref: FROM_UNIXTIME(unix_timestamp[,format])
@@ -23,12 +23,12 @@ class CompressionDbManager { | |||
* Retrieves the last `limit` number of jobs and the ones with the given | |||
* job IDs. | |||
* | |||
* @param {number} limit | |||
* @param {string} lastUpdateDate | |||
* @param {number[]} jobIds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we rely on a "last updated" date to check which jobs have been sync'ed, maybe we don't need the mechanism to specifically track any "PENDING" jobs any more. What do you think?
If you agree, we can remove this jobIds
arguments, and the function body can be written as:
const [results] = await this.#sqlDbConnPool.query(`
SELECT
id as _id,
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.STATUS},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.STATUS_MSG},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.START_TIME},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.UPDATE_TIME},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.DURATION},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.UNCOMPRESSED_SIZE},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.COMPRESSED_SIZE}
FROM ${this.#compressionJobsTableName}
WHERE update_time >= FROM_UNIXTIME(${lastUpdateTimestamp})
ORDER BY _id DESC;
`);
return results;
@@ -23,12 +23,12 @@ class CompressionDbManager { | |||
* Retrieves the last `limit` number of jobs and the ones with the given |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's update this docstring
* Retrieves the last `limit` number of jobs and the ones with the given | |
* Retrieves compression jobs that are updated on or after a specific time. |
@@ -23,12 +23,12 @@ class CompressionDbManager { | |||
* Retrieves the last `limit` number of jobs and the ones with the given | |||
* job IDs. | |||
* | |||
* @param {number} limit | |||
* @param {string} lastUpdateDate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you agree it's better to pass in a timestamp, here we can write
* @param {string} lastUpdateDate | |
* @param {number} lastUpdateTimestamp in seconds. |
/** | ||
* @type {string} | ||
*/ | ||
let lastUpdateDate = new Date().toISOString() | ||
.slice(0, CONST_FOR_DATE_FORMAT) | ||
.replace("T", " "); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in components/webui/imports/api/ingestion/server/CompressionDbManager.js
, instead of tracking the time as a string, can we store it as a numerical timestamp instead?
If so, we can name it lastUpdateTimestampSeconds
const previousUpdateDate = lastUpdateDate; | ||
lastUpdateDate = new Date().toISOString() | ||
.slice(0, CONST_FOR_DATE_FORMAT) | ||
.replace("T", " "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can't be sure how fast computers are / there're any optimizations done by the JS VM, but we might have a delay in getting this current time.
i think a better approach to get the last updated time, which also handles a situation where the Meteor server's time going out sync with the MySQL server, is to SELECT UNIX_TIMESTAMP()
as we perform the query in getCompressionJobs
.
Description
#667
When ingestion jobs are submitted at a fast enough pace the ingestion table in the UI would be missing some of the submitted jobs. The ingestion table would request the most recent 5 jobs when looking for jobs and if job came quicker in the previous time period, not all jobs would be reflected in the UI.
FIX:
Validation performed
Submitted increasingly large number soft background compression jobs and made sure that they were all reflected in the table and that the updated to the status in the ingestion table occur accordingly.
Summary by CodeRabbit
Release Notes
New Features
update_time
column to track the last update timestamp for compression jobs.Improvements
User Interface
The changes provide more granular tracking and visibility into compression job status and history.